This paper investigates how gender bias in occupation nouns and adjectives in Korean language models affects the gender referred to by pronouns, using the Winogender dataset. The experiment was conducted in three ways: measuring surprisal scores in Korean and in English using encoder models, and testing the responses of a decoder model, ChatGPT-4. The encoder models showed that, regardless of differences in the gender bias of occupation nouns and adjectives, the male pronoun was used more naturally in sentences than the female pronoun. The decoder model, by contrast, detected gender bias, especially in sentences containing adjectives. These results point to the influence of gender imbalance in the training data and to functional differences between language generation models and language comprehension models. The study suggests constructing a dedicated dataset that reflects the characteristics of Korean in order to analyze gender bias in Korean language models more effectively.
Table of Contents
1. Introduction
2. Background
3. Data
4. Experiments
5. Comparison of Korean BERTs and ChatGPT4
6. Public Library
7. Conclusion
References
Source: DBPia